Data Visualisation in Python

Government Analysis Function and Data Science Campus Logos.

This is a HTML document. The Introduction to Python course is written and intended to be used in a Jupyter Notebook file. These HTML documents have been made available for users who require screen readers or other accessibility needs. These HTML documents have been tested, but if you notice any errors or any compatibility issues please contact us on the GSS Capability email inbox.

If you are using a screen reader you will need to set your punctuation level (sometimes called verbosity) to full, especially for the code sections.

Chapter 3 – Creating a Variety of Plot Types

Chapter Overview

  • Packages and Data

  • Continuous Data

    • Histogram
  • Discrete

    • Bar (Frequency)
  • Continuous X, Continuous Y *Scatter Plots Revisited

  • Continuous Functions

    • Line Plots
  • Discrete X, Continuous Y

    • Bar Charts Revisited
    • Box Plots

1 Packages and Data

Let’s start, as always by loading our packages and our data.

We’re using:

  • Numpy – Version 1.12.1
  • Pandas – Version 0.20.1
  • Matplotlib – Version 2.0.2
  • Seaborn - Version 0.7.1

Remember you can use the .__version__ attribute (e.g np.__version__ )to check your version.

More information about the packages is given in Chapter 1.

We’re following standard convention for nicknames, and we’ll also load the gapminder data.

 # Load packages
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib import rcParams # To set our defaults

# Load data
gapminder = pd.read_csv("../data/gapminder.csv")

We’ll also use the magic command

% matplotlib inline

This means any plot we create will be automatically embedded below the code cell once the code has been executed.

#% matplotlib inline

When we load Seaborn it uses its style as the default one; overriding Matplotlib.

Here I’m setting the default style to ticks which is the most similar to our Matplotlib default.

sns.set_style("ticks")

Here we’ll also bring in those default values we talked about in the last chapter.

# Set Default Fonts
rcParams["font.family"] = "sans-serif"
rcParams["font.sans-serif"] = ["Arial", "Tahoma"]

# Set Default font sizes
small_size = 12
medium_size = 14
bigger_size = 16

# Change the font size for individual elements
matplotlib.rc("font", size=small_size)          # controls default text sizes
matplotlib.rc("axes", titlesize=small_size)     # fontsize of the axes title
matplotlib.rc("axes", labelsize=medium_size)    # fontsize of the x and y labels
matplotlib.rc("xtick", labelsize=small_size)    # fontsize of the tick labels
matplotlib.rc("ytick", labelsize=small_size)    # fontsize of the tick labels
matplotlib.rc("legend", fontsize=small_size)    # legend fontsize
matplotlib.rc("axes", titlesize=medium_size)    # title fontsize

As a reminder visualisation code can get lengthy quickly.

A lot of these visualisations will only have one or two new concepts, the other code will be things covered previously.

To make the code clearer we will be using lots of comments. In Python these look like this:


# This is a comment

As we get into more complicated visualisations new concepts will have the word NEW - at the start of the comment e.g:


#  NEW - Set X Axis

We will also refer to line numbers to describe content.

If you are using Jupyter Notebook, you can turn on line numbers in the View Menu -> Toggle Line Numbers

2 Continuous Data

2.1 Histogram

We create histograms in Matplotlib using axes.hist().

Histograms take one argument, the X axis value.

In a histogram each column represents a group of a continuous, quantitative variable, people often confuse these with bar charts; where each column represents a group defined by a categorical variable.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.hist(x = gapminder["life_exp"])

# Add Labels, Title and Captions
plt.suptitle("Frequency of Life Expectancy at Birth", horizontalalignment = "right")
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.03, s="Source: Gapminder", ha="left")
axes.set_xlabel("Life expectancy at birth in years ")
axes.set_ylabel("Count")

# Set xlim - the lowest x value - to 0
axes.set_xlim(0)

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

2.1.1 Changing bin sizes

The default number of groups, called bins is defined by rcParam, and in my version is 10.

To get finer control we can change the bins = parameter inside our axes.hist(). Here I’m setting my number of bins to 40.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.hist(gapminder["life_exp"],
          bins = 40) # NEW - Change the bin size

# Add Labels, Title and Captions
plt.suptitle("Frequency of Life Expectancy at Birth", horizontalalignment = "right")
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Life expectancy at birth in years ")
axes.set_ylabel("Count")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set xlim to 0
axes.set_xlim(0)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

We can of course use range() or np.arrange() to set our bins programmatically. More information on the np.arrange() function can be found in the numpy documentation. In the example below 10 bins (“groups”) are being created that span between 0 and the maximum value in life expectancy + 1.

You might find it helpful to run the code below to view the bin ranges before applying them to the plot.

np.arange(start = 0, stop = (gapminder[“life_exp”].max() + 1), step = 10) )

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.hist(gapminder["life_exp"],
          bins = np.arange(start = 0,
                           stop = (gapminder["life_exp"].max() + 1),
                           step = 10) ) # NEW - Change the bin size

# Add Labels, Title and Captions
plt.suptitle("Frequency of Life Expectancy at Birth", horizontalalignment = "right")
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Life expectancy at birth in years ")
axes.set_ylabel("Count")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set xlim to 0
axes.set_xlim(0)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

Have an experiment at changing the number of bins yourself.

We can create a horizontal histogram by setting the orientation = “horizontal” in our axes.hist().

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.hist(gapminder["life_exp"],
          bins = 15, # Change the bin size
          orientation = "horizontal")

# Add Labels, Title and Captions
plt.suptitle("Frequency of Life Expectancy at Birth", horizontalalignment = "right")
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Life expectancy at birth in years ")
axes.set_ylabel("Count")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

2.1.2 Colours

We can use the same colour formats here as we explored in Chapter 2.

We won’t be going over all the details again in these subsequent sections; but will just be showing you an example or two with different methods.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.hist(gapminder["life_exp"],
          bins = 15, # Change the bin size
          color = "orange")

# Add Labels, Title and Captions
plt.suptitle("Frequency of Life Expectancy at Birth", horizontalalignment = "right")
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Life expectancy at birth in years ")
axes.set_ylabel("Count")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

We can also set the edge colour for the histogram by using edgecolor = .

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.hist(gapminder["life_exp"],
          bins = 15, # Change the bin size
          color = "orange", # Set the colour of the bar
          edgecolor = "black") # NEW - Set the colour of outside of each bar

# Add Labels, Title and Captions
plt.suptitle("Frequency of Life Expectancy at Birth", horizontalalignment = "right")
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Life expectancy at birth in years ")
axes.set_ylabel("Count")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

3 Discrete Data

3.1 Bar (Frequency) Chart

In this section we’ll look at bar charts as they relate to frequency of the data. Later, we’ll look at bar charts with a continuous y and discrete x.

Firstly I’m going to create a new column; if the value in gdp_per_cap is greater than or equal to the mean of the gdp_per_cap column the new column (above_mean_gdp) will have a 1, if not it will have a 0.

I’ll then filter that for 1992; and use a group_by to find the number of countries above and below the mean.

# Add in a new column - 1 if greater than or equal to mean GDP of the total data. 0 if less than total mean GDP of data.

gapminder["above_mean_gdp"] = (gapminder["gdp_per_cap"] >= gapminder["gdp_per_cap"].mean()).astype("int64")


# Create a new dataset just for 1992
gapminder_1992 = gapminder[gapminder["year"] == 1992] 


# Group by this new column and do a count.
mean_gdp_92 = gapminder_1992.groupby("above_mean_gdp")["country"].count()

# Look at the Data
mean_gdp_92
above_mean_gdp
0    90
1    52
Name: country, dtype: int64

I can then plot this data using axes.bar():

First we specify the data we want to plot.

In earlier versions of Matplotlib this takes the parameter left = , this was changed in more modern versions to be x =.

For compatibility issues we’ve not included the parameter, but we highly advise you find out which parameter your version uses and apply that, parameters can make your code easier to read.

The height= parameter gives the height of our bars. The tick_label = parameter takes the values specified and applies them as tick labels.

As the groupby() gave us the values of 0 and 1 as the index we set our data (e.g left = or x=) to mean_gdp_92.index, and our height is just mean_gdp_92, as we don’t have any other columns.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.bar(mean_gdp_92.index,  # The X axis is the index values
         height = mean_gdp_92,
         tick_label = ["Below Mean", "Above Mean"])


# Add Labels, Title and Captions
plt.suptitle("Countries with below and above mean GDP", x = 0.3)
plt.title("1992 Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Countries")
axes.set_ylabel("Number of Countries")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

3.1.1 Plotting Categorical Variables

Below I’m preparing my data. I’m using a groupby to return the number of countries in each continent. I’m using the parameter here

as_index = False

If we don’t set this parameter our continents will become our index.

Usually this is fine – however when plotting bar charts Matplotlib won’t let us use text columns for our X axis; this includes an index that is text.

By having a numerical index we can create our bar chart and then use column with the continents to display our tick labels.

number_countries = gapminder_1992.groupby(by = "continent", as_index= False)["country"].count()

number_countries.rename(columns = {"country": "num_countries"}, inplace= True) # Change the column name to be more descriptive

number_countries
  continent  num_countries
0    Africa             52
1  Americas             25
2      Asia             33
3    Europe             30
4   Oceania              2

As mentioned earlier left = only accepts scalar values. Here we’re using the index of the DataFrame.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.bar(number_countries.index,  # The X axis is the index values as these are numbers
         height = number_countries["num_countries"]);

# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Continent")
axes.set_ylabel("Count")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

We can add the names of the continent back by using the tick_label = parameter; and setting this to the continent column.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.bar(number_countries.index,  # The X axis is the index values
         height = number_countries["num_countries"],
         tick_label = number_countries["continent"]); # NEW -Set the tick labels to be the continent names, not just numbers.

# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Continent")
axes.set_ylabel("Count")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

3.1.1.1 Seaborn

Seaborn makes it much easier to create this bar chart.

To match the other visualisation I need to sort my gapminder data by continent - the default order for bars is not alphabetical.

The x axis is set to be the continent column, the y axis to be the life expectancy column (it will automatically plot the mean of the column) and my data as the sorted dataframe gapminder_sort_continent.

By default the sns.barplot() has confidence interval bars on; we turn those off by setting ci = False

estimator = is by default the mean; but a different argument here can be passed, such as the count (using np.count_nonzero) as done here.

bar_plot = sns.barplot(x="continent", y = "life_exp" , data=gapminder_1992,
                       ci = None, # Removes confidence interval bars
                       estimator = np.count_nonzero); # Uses the count - mean is default

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Number of Countries In Each Continent", x = 0.30, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.047)
bar_plot.text(x=3, y= -15, s="Source: Gapminder", ha="left")
bar_plot.set_xlabel("Continent")
bar_plot.set_ylabel("Count")


# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

3.1.2 Ordered Bar Charts

Good practice is often to sort our bar chart values in ascending or descending order. Which order depends on what we want to highlight – the smallest or the largest values.

Here I’m creating a new dataframe with the ascending values. Note that it’s really important to reset the index here – after all we are using the index to state what order we want our bars on the chart.

# Create a new dataframe - where the number of continents is sorted in ascending order.
asc_number_countries = number_countries.sort_values(by = "num_countries").reset_index()
asc_number_countries.head()
   index continent  num_countries
0      4   Oceania              2
1      1  Americas             25
2      3    Europe             30
3      2      Asia             33
4      0    Africa             52
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.bar(asc_number_countries.index,  # The X axis is the index values
         height = asc_number_countries["num_countries"],
         tick_label = asc_number_countries["continent"]); # Set the tick labels to be the continent names, not just numbers.

# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Continent")
axes.set_ylabel("Count")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

And in descending order we just set ascending = False.

# Create a new dataframe - where the number of continents is sorted in descending order.
desc_number_countries = number_countries.sort_values(by = "num_countries", ascending=False).reset_index()
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.bar(desc_number_countries.index,  # The X axis is the index values
         height = desc_number_countries["num_countries"],
         tick_label = desc_number_countries["continent"]); # Set the tick labels to be the continent names, not just numbers.

# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Continent")
axes.set_ylabel("Count")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

For Seaborn, again using pre-sorted data is the simplest way to get your plot to look the way you desire.

I am using the asc_number_countries and the desc_number_countries DataFrames we created earlier.

Note that here Seaborn colours each bar a different colour; as our continents are a broad group our guidelines say they should be the same colour. We’ll cover how to do this in a later section.

# Ascending
bar_plot = sns.barplot(x="continent", y = "num_countries", 
                       data=asc_number_countries,
                       ci=None, # Removes confidence interval bars
                       estimator=np.sum); # Uses the sum

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Number of Countries In Each Continent", x = 0.30, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.047)
bar_plot.text(x=3, y= -15, s="Source: Gapminder", ha="left")
bar_plot.set_xlabel("Continent")
bar_plot.set_ylabel("Count")


# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

And in descending order = here I’ve set the color = parameter to be blue, which applies to all of the bars.

# Descending
bar_plot = sns.barplot(x="continent", y = "num_countries", 
                       data=desc_number_countries,
                       ci=None, # Removes confidence interval bars
                       estimator=np.sum) # Uses the sum

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Number of Countries In Each Continent", x = 0.30, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.047)
bar_plot.text(x=3, y= -15, s="Source: Gapminder", ha="left")
bar_plot.set_xlabel("Continent")
bar_plot.set_ylabel("Count")


# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

3.1.3 Horizonal Bar Charts

If tick labels are long a horizontal bar chart is often preferable to rotating tick labels.

A horizontal bar chart is created by using barh

Note the parameters are different than our axes.bar() above

Again, depending on your version of Matplotlib the parameter for the data may be different; older versions use bottom = and later versions use y =. We’ve omitted the parameter for compatibility, but recommend you find which is the correct version for you.

  • width is the values on our x axis; the mean life expectancy.

  • tick_label applies our names to the labels as before.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.barh(number_countries.index,  # The Y axis is the index values
          width = number_countries["num_countries"],
          tick_label = number_countries["continent"]); # Set the tick labels to be the continent names, not just numbers.

# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Continent")
axes.set_xlabel("Count")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

3.1.3.1 Seaborn

To create a horizontal bar chart in Seaborn you need to change the x and y axis around; and set the parameter orient = h.

bar_plot = sns.barplot(y="continent", x = "num_countries", 
                       data=number_countries, # note x and y are reversed
                       ci=None, # Removes confidence interval bars
                       estimator=sum, # Uses the count - mean is default
                       orient="h") # Changes the orientation to horizontal

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Number of Countries In Each Continent", x = 0.30, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.047)
bar_plot.text(x=35, y= 5.5, s="Source: Gapminder", ha="left")
bar_plot.set_ylabel("Continent")
bar_plot.set_xlabel("Count")


# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "x", color = (0.745, 0.745, 0.745))

# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

3.1.4 Spacing Between Bars

In Matplotlib you can alter the spacing by adjusting the width of the bars using the width parameter.

This takes a scale from 0 (farthest apart) to 1 (touching). The default is 0.8

The example below is an extreme one – it’s definitely not best practice!

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.bar(number_countries.index,  # The X axis is the index values
         height = number_countries["num_countries"],
         tick_label = number_countries["continent"],
         width = 0.2); # Set the tick labels to be the continent names, not just numbers.

# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Continent")
axes.set_xlabel("Count")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

Again this is also possible in Seaborn, but can require a function to modify the patches attribute.

Help for this is available in several Stack Overflow answers

3.1.5 Colouring Bars

Again, we won’t go through all of the colour options here; as these are covered in chapter 2.

We should use the same colour and shade for categorical data that cannot be organised into broad groups.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.bar(number_countries.index,  # The X axis is the index values
         height = number_countries["num_countries"],
         tick_label = number_countries["continent"], # Set the tick labels to be the continent names, not just numbers.
         color = "CadetBlue"); # NEW - Set colour

# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Continent")
axes.set_xlabel("Count")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

We’ll discuss multiple colours when we look at bar charts with a continuous y and discrete x later. Our continents are a broad group, so according to guidance it’s not appropriate to colour them individually.

3.1.5.1 Seaborn

Seaborn uses the color parameter.

bar_plot = sns.barplot(x="continent", y = "life_exp", 
                       data=gapminder_1992,
                       ci=None, # Removes confidence interval bars
                       estimator=np.count_nonzero, # Uses the count - mean is default
                       color="blue"); 

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Number of Countries In Each Continent", x = 0.30, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.047)
bar_plot.text(x=3, y= -10.5, s="Source: Gapminder", ha="left")
bar_plot.set_ylabel("Continent")
bar_plot.set_xlabel("Count")


# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

3.1.6 Value Labels

We can also create value labels for bars. If you find yourself regularly labelling individual bar values, a table may be a better presentation method. In alignment with the GSS guidelines bar values should be aligned at the bottom of the bars to enable easy comparison.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.barh(number_countries.index,  # The Y axis is the index values
          width = number_countries["num_countries"],
          tick_label = number_countries["continent"]); # Set the tick labels to be the continent names, not just numbers.

# Add Labels, Title and Captions
plt.suptitle("Number of Countries In Each Continent", x = 0.25)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.02)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Continent")
axes.set_xlabel("Count")

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))


# Create value labels programaticaly
# Create a list of the total values to loop over
country_count = number_countries["num_countries"].tolist()
# Loop over the values
for i in range(len(country_count)):
    axes.annotate(country_count[i], # Text is the value of the loop we're currently on
                  xy = (0, i),  # At position 0 on the x axis, and the y position of the current loop
                  color = "white",
                  va = "center") # Set the colour of the text to white

plt.show();

4 Continuous X, Continuous Y

4.1 Scatter Plots Revisited

We covered scatter plots in great detail in chapter 2, however we will revisit some elements here.

As a basic reminder we plot scatter plots like this in:

Matplotlib

# Make our 1987 data again
gm_1987 = gapminder[gapminder["year"] == 1987]
gm_1987.head()
        country continent  year  ...  infant_mortality  fertility  above_mean_gdp
7   Afghanistan      Asia  1987  ...               NaN        NaN               0
19      Albania    Europe  1987  ...              40.8       3.13               0
31      Algeria    Africa  1987  ...              46.3       5.51               0
43       Angola    Africa  1987  ...             134.1       7.20               0
55    Argentina  Americas  1987  ...              27.1       3.06               1

[5 rows x 9 columns]
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.scatter(x=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"])


# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita", 
             fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)


# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));


# Set Gridlines and colours
# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();

Pandas:

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
gm_1987.plot(x = "gdp_per_cap", y = "life_exp",
             kind = "scatter",
             ax = axes) # Plot on the axes

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita", 
             fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();

4.1.1 Choosing Your Axis

We covered this in chapter 2 – but just to reiterate there are several ways to set the values of the x and y axes. These include: * axes.set(ylim=(0), xlim= (0)) * axes.set_ylim(bottom=0, top = 10000) and axes.set_xlim(left=0) * plt.xlim() and plt.ylim()

If we don’t pass a parameter like top matplotlib will automatically decide for us. We discussed the importance of setting the y axis to start at 0, here I’ve used axes.set_ylim(bottom=0, top = 85) which gives more white space above the top of the data, this can make the points at the top more readable.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.scatter(x=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"])


# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita", 
             fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();

And Seaborn

plot = sns.lmplot(x = "gdp_per_cap", y = "life_exp", data = gm_1987,
                  fit_reg = False )

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Graph showing Life Expectancy by GDP per Capita", 
             fontname="Arial", size=16, y = 1.05, x = 0.65)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left", y = 1.01)
plt.text(x= 20000.0, y = -12, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)

# Get axes and change grid lines and colours
axes = plot.ax
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

# Note the methods are set_xlabels -with an s!
plot.set_xlabels("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)

plot.set_ylabels("Life expectancy at birth in years", fontname="Arial", size=12)

plt.ylim(0, None) # Set our Y axis to start from 0 or it "floats")
plt.show();

As of version 0.9.0 Seaborn has a function called sns.scatterplot() at current time of writing (November 2020). These versions might not be available within your government department, as they are quite new. If you can upgrade, you may need to update other packages (e.g numpy and scipy). We recommend working in a virtual environment to do so; especially if you require certain versions of packages for your role. If you want to experiment with non sensitive data, then options like Google Colab will let you experiment with packages without installing to your machine.

For this plot the code will be given as a markdown cells. This is because they won’t run for everyone if they are in a code cell, there will also be an image of the output.

# Plot
plot = sns.scatterplot(x = "gdp_per_cap", y = "life_exp", data = gm_1987)

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)
plot.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Graph showing Life Expectancy by GDP per Capita", 
             fontname="Arial", size=16, y = 1.0, x = 0.5)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left", y = 1.01, x = -0.06)
plt.text(x= 20000.0, y = -20, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)

# Note the methods are set_xlabel
plot.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
plot.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

plt.ylim(0, None); # Set our Y axis to start from 0 or it "floats")

Seaborn Scatter Plot

4.1.2 Changing Markers

4.1.2.1 Marker type

All three types of plot use the markers from matplotlib. You can find a table of markers and definitions on the matplotlib website.

Note here the paramater in matplotlib and using .plot() is marker

The paramater in Seaborn is markers

4.1.2.2 Marker size.

This can be achived in matplotlib and .plot() by adding the parameter

s

These are measured in points; like fonts. Each point is equivalent to 1/72 of an inch. While it’s possible some people measure out their ideal point size and apply it; most visualisations generally involve experimenting until something looks “about right”.

The s parameter may work in newer version of Seaborn; however the following parameter will if it does not.

scatter_kws={"s": 72}

Sufficient google searching couldn’t find the specific scale of measurement for these markers; but again advice seemed to follow the “stop when it looks about right” family.

4.1.2.3 Marker colours

We covered colours in depth in chapter 2.

In Matplotlib and .plot() colours can be altered using the color parameter and a variety of methods including:


color = "blue" # Using named or web safe colours)
color = (0, 0.239, 0.349) 
# Using RGB colours, each expressed as a decimal from 1 to 0 
color = "#A0E7E5" # Using hexidacimal codes

Matplotlib:

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.scatter(x=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
             marker = "x", # Set marker style
             s = 72, # Set Size
             color = (0, 0.239, 0.349)) # Set colour


# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita", 
             fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")
figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();

Pandas:

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
gm_1987.plot(x = "gdp_per_cap", y = "life_exp",
             kind = "scatter", 
             marker = "D", # Set marker style
             s = 40, # Set Marker Size
             color = "cyan", # Set color
             ax = axes) # Plot on the axes

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita", 
             fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();

For Seaborn we can include colours in the scatter_kws={} dictionary.

This accepts most of the methods shown above, but struggles with RGB values.

The transparancy (or alpha) in Seaborn seems to be set around 80% so don’t worry if your colours don’t come out quite as you’d expect them. We’ll look at how to sort the alpha in the next section.

scatter_kws={“color”: “#4C2C2E”}

Note that we can put both the size and the color arguments in the same dictionary e.g

scatter_kws={“s”: 200, “color”: “#4C2C2E”}

plot = sns.lmplot(x = "gdp_per_cap", y = "life_exp", data = gm_1987,
                  fit_reg = False,
                  markers = "*",
                  palette= "red", 
                  scatter_kws = {"s": 200,
                                 "color": "#4C2C2E"})

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Graph showing Life Expectancy by GDP per Capita", 
             fontname="Arial", size=16, y = 1.05, x = 0.65)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left", y = 1.01)
plt.text(x= 20000.0, y = -13, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)

# Note the methods are set_xlabels -with an s!
plot.set_xlabels("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)

plot.set_ylabels("Life expectancy at birth in years", fontname="Arial", size=12)

# Get axes and change grid lines and colours

axes = plot.ax
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

plt.ylim(0, None) # Set our Y axis to start from 0 or it "floats")
plt.show();

4.1.3 Using Alpha With Large Amounts Of Data

We also talked about alpha in chapter 2. This can be really useful with large amounts of data.

For our Matplotlib and .plot() charts this is as simple as setting the alpha = to a decimal between 0 (transparent) and 1 (solid)

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.scatter(x=gm_1987["gdp_per_cap"], y=gm_1987["life_exp"],
             alpha = 0.5) # Set colour


# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita", 
             fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();

Pandas:

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
gm_1987.plot(x = "gdp_per_cap", y = "life_exp",
             kind = "scatter", 
             alpha = 0.5,
             ax = axes) # Plot on the axes

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by GDP per Capita", 
             fontname="Arial", size=16)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();

For Seaborn this is set in our scatter_kws{} dictionary using alpha as the key.

Frustratingly in the sns.lmplot() alpha seems to be automatically set to around 80%, bear this in mind when choosing colours!

plot = sns.lmplot(x = "gdp_per_cap", y = "life_exp", data = gm_1987,
                  fit_reg = False,
                  markers = "*",
                  palette = "red", 
                  scatter_kws = {"alpha" : 1, 
                               "color": "#4C2C2E"})

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Graph showing Life Expectancy by GDP per Capita", 
             fontname="Arial", size=16, y = 1.05, x = 0.65)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left", y = 1.01)

plt.text(x= 20000.0, y = -12 , s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)

# Note the methods are set_xlabels -with an s!
plot.set_xlabels("Gross Domestic Product per Capita in International Dollars", fontname="Arial", size=12)

plot.set_ylabels("Life expectancy at birth in years", fontname="Arial", size=12)

# Get axes and change grid lines and colours

axes = plot.ax
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

plt.ylim(0, None) # Set our Y axis to start from 0 or it "floats")
plt.show();

5 Continuous Functions

5.1 Line Plots

In Matplotlib the function axes.plot() produces a line plot.

The data we pass goes in the order x and then y – however we just pass them in order – we don’t pass parameters here like we have done previously.

# Set up some new data to plot
uk_gapminder = gapminder[gapminder["country"] == "United Kingdom"]
# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.plot(uk_gapminder["year"], uk_gapminder["life_exp"])

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK", 
             fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();

If we also do a scatter plot of the same data we can notice how there’s some smoothing of the line happening. My scatter plot dots are not perfectly sat in the middle of the line.

Note that I’ve not set the y axis to 0 here – just to make it easier to see that the line smoothed.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.plot(uk_gapminder["year"], uk_gapminder["life_exp"])

# Plot the Data
axes.scatter(x = uk_gapminder["year"], y = uk_gapminder["life_exp"],
             color = "red")


# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK", 
             fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

plt.show();

Panda’s .plot() by default creates a line graph.

Here I’ve specified kind = “line” to make it clear to the user what the output will be if they’re just reading the code.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
uk_gapminder.plot(x = "year", y = "life_exp",
                  kind = "line",
                  ax = axes) # Plot on the axes

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK", 
             fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)


# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
axes.legend(loc = "lower right")

plt.show();

Interestingly Seaborn only introduced lineplots from version 0.9.

As mentioned in Scatterplots above for these plots the code will be given as markdown cells. This is because they won’t run for everyone if they are in a code cell, there will also be an image of the output.

5.1.1 Seaborn Relplot

plot = sns.relplot(x = "year", y = "life_exp", data= uk_gapminder,
                   kind = "line")

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK", 
             fontname="Arial", size=16, y = 1.06, x = 0.35)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left", y = 1.01, x = -0.13)
          
plt.text(y = -12, x = 1989, s = "Source: Gapminder", ha="left",
            fontname="Arial", size=12)

#Set the X and Y axis Labels
# Note the methods are set_xlabels -with a s!
plot.set_xlabels("Year", fontname="Arial", size=12)
plot.set_ylabels("Life expectancy at birth in years", fontname="Arial", size=12)

# Get axes and plot grids
axes = plot.ax
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

# Set the Y axis lower limit to 0 and upper to 90 to give space.
plt.ylim(0, 90); # Set our Y axis to start from 0 or it "floats")

Seaborn Relplot

5.1.2 Seaborn Lineplot

# Lineplot will plot onto a figure and axes object - giving more fine controll.
figure, axes = plt.subplots(figsize=(6, 5))

# Create the plot
plot = sns.lineplot(x = "year", y = "life_exp", data = uk_gapminder)

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK", 
             fontname="Arial", size=16, y = 1.0, x = 0.35)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left", y = 1.04, x = -0.13)
plt.text(y = -12, x = 1989, s = "Source: Gapminder", ha="left",
            fontname="Arial", size=12)

# Note the methods are set_xlabel here -
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Get axes and plot grids
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

# Set the Y axis lower limit to 0 and upper to 90 to give space.
plt.ylim(0, 90); # Set our Y axis to start from 0 or it "floats")

Seaborn Relplot

5.1.3 Plotting Through The Origin

As we’ve previously covered, good practice is to plot through the origin. As mentioned above there are several ways to do this.

Note the visualisation on the left; where the y axis has not been set to 0 appears to be a much steeper curve than the one on the right where the axis has been adjusted. This could be considered misleading.

# Create our figure and our axes
figure, (axes1, axes2) = plt.subplots(1, 2, figsize=(10, 4))

### Axes 1 ###
# Plot the Data for ax 1
axes1.plot(uk_gapminder["year"], uk_gapminder["life_exp"])

# Set x and y axis labels
axes1.set_xlabel("Year", fontname="Arial", size=12)
axes1.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes1.set_title("Automatically imputed Y axis", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Set Gridlines and colours
axes1.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes1.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes1.set_frame_on(False)
axes1.set_axisbelow(True)

### Axes 2 ###
# Plot the Data for ax 2
axes2.plot(uk_gapminder["year"], uk_gapminder["life_exp"])

# Set x and y axis labels
axes2.set_xlabel("Year", fontname="Arial", size=12)
axes2.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
axes2.set_title("With Y axis set to 0", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Set Gridlines and colours
axes2.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes2.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes2.set_frame_on(False)
axes2.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes2.set_ylim(bottom=0, top = 85)   


# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK", 
             fontname="Arial", size=16, x = 0.3, y = 1.05)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left", x = -1.25, y = 1.09)

figure.text(x=0.75, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)

plt.show();

5.1.4 Line Styles

Matplotlib Linestyles include

[‘solid’, ‘dashed’, ‘dashdot’, ‘dotted’ | '-', '--', '-.', ':', 'None' ]

More details can be found in the matplotlib documentation pages.

Some of these are duplicates – “dotted” and “:” produce an identical result. Note again that “dotted” makes more sense to humans reading code than “:” does.

The same arguments also work for our .plot() visualisation.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.plot(uk_gapminder["year"], uk_gapminder["life_exp"],
          linestyle = "dotted") # NEW set a linestyle

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK", 
             fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();

The same arguments also work for our .plot() visualisation.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
uk_gapminder.plot(x = "year", y = "life_exp",
                  kind = "line",
                  ax = axes, # Plot on the axes
                  linestyle = "dashed" ) # NEW - Set a linestyle.

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK", 
             fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
axes.legend(loc = "lower right")

plt.show();

5.1.5 Line Colours

We can set colours the same ways as we’ve seen before. Here we’ll just show a few visualisations for demonstration purposes.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.plot(uk_gapminder["year"], uk_gapminder["life_exp"],
          color = "#235956") # NEW set a colour

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK", 
             fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
plt.show();

The same arguments also work for our .plot() visualisation.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
uk_gapminder.plot(x = "year", y = "life_exp",
                  kind = "line",
                  ax = axes, # Plot on the axes
                  linestyle = "dashed", # NEW - Set a linestyle.
                  color = (0.076, 0.808, 0.84 ))

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year for UK", 
             fontname="Arial", size=16, x = 0.43)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left")

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)
axes.legend(loc = "lower right")

plt.show();

5.1.6 Plotting Multiple Lines

As we saw in chapter 2 with scatter plots we can plot multiple lines on the same axis.

5.1.6.1 Matplotlib

This can be something as simple as creating multiple ax.plot() objects.

You can also consider labeling the lines rather than, or in additon to, an index.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data Line for UK
axes.plot(gapminder[gapminder["country"] == "United Kingdom"]["year"],  # Our "x"
          gapminder[gapminder["country"] == "United Kingdom"]["life_exp"], # Our "y"
          color = "#003D59", # Set colour
          label = "United Kingdom" ) # Set label

axes.annotate("United Kingdom",
              xy = (gapminder[gapminder["country"] == "United Kingdom"]["year"].max() + 0.3,
                   gapminder[gapminder["country"] == "United Kingdom"]["life_exp"].max()),
              color = "#003D59") # Set colour)


axes.plot(gapminder[gapminder["country"] == "Malawi"]["year"],  # Our "x"
          gapminder[gapminder["country"] == "Malawi"]["life_exp"], # Our "y"
          color = "#A8BD3A", # Set a colour
          label = "Malawi")

axes.annotate("Malawi",
              xy = (gapminder[gapminder["country"]== "Malawi"]["year"].max() + 0.3, #x + little pad
                   gapminder[gapminder["country"]== "Malawi"]["life_exp"].max()), #y
              color = "#A8BD3A") # Set colour)          

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by Year", 
             fontname="Arial", size=16, x = 0.43, y = 1.05)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left", y = 1.1, x = -0.025)

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Set Gridlines and colours
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)
axes.set_axisbelow(True)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)

# Customise and show the Legend
axes.legend(bbox_to_anchor=(0., 1.0, 1.0, .102), loc="lower left",
            ncol=2, 
            mode="expand", 
            borderaxespad=0,
            prop={"family": "Arial", "size":12})

plt.show();

As we’ve seen before we can also use a loop to plot multiple values. Caution should be taken here to not overload a line plot with data and create a “spaghetti plot”.

Plotting a line for each of the values in country (142) or even for the countries in Europe (30) would be far too many. For these plots we’ll plot the mean life expectancy per year by continent. To do this group by continent and year, and find the mean life expectancy. .reset_index() resets the index to start from 0 again.

continent_year_life_exp = gapminder.groupby(["continent", "year"])["life_exp"].mean().reset_index()
continent_year_life_exp # View the data
   continent  year   life_exp
0     Africa  1952  39.135500
1     Africa  1957  41.266346
2     Africa  1962  43.319442
3     Africa  1967  45.334538
4     Africa  1972  47.450942
5     Africa  1977  49.580423
6     Africa  1982  51.592865
7     Africa  1987  53.344788
8     Africa  1992  53.629577
9     Africa  1997  53.598269
10    Africa  2002  53.325231
11    Africa  2007  54.806038
12  Americas  1952  53.279840
13  Americas  1957  55.960280
14  Americas  1962  58.398760
15  Americas  1967  60.410920
16  Americas  1972  62.394920
17  Americas  1977  64.391560
18  Americas  1982  66.228840
19  Americas  1987  68.090720
20  Americas  1992  69.568360
21  Americas  1997  71.150480
22  Americas  2002  72.422040
23  Americas  2007  73.608120
24      Asia  1952  46.314394
25      Asia  1957  49.318544
26      Asia  1962  51.563223
27      Asia  1967  54.663640
28      Asia  1972  57.319269
29      Asia  1977  59.610556
30      Asia  1982  62.617939
31      Asia  1987  64.851182
32      Asia  1992  66.537212
33      Asia  1997  68.020515
34      Asia  2002  69.233879
35      Asia  2007  70.728485
36    Europe  1952  64.408500
37    Europe  1957  66.703067
38    Europe  1962  68.539233
39    Europe  1967  69.737600
40    Europe  1972  70.775033
41    Europe  1977  71.937767
42    Europe  1982  72.806400
43    Europe  1987  73.642167
44    Europe  1992  74.440100
45    Europe  1997  75.505167
46    Europe  2002  76.700600
47    Europe  2007  77.648600
48   Oceania  1952  69.255000
49   Oceania  1957  70.295000
50   Oceania  1962  71.085000
51   Oceania  1967  71.310000
52   Oceania  1972  71.910000
53   Oceania  1977  72.855000
54   Oceania  1982  74.290000
55   Oceania  1987  75.320000
56   Oceania  1992  76.945000
57   Oceania  1997  78.190000
58   Oceania  2002  79.740000
59   Oceania  2007  80.719500

Now we have just 5 lines to plot, but still can get an overview of our data.

# Set up the arguments for the loop
continents = continent_year_life_exp["continent"].unique().tolist() # Create a list with each unique value in continent column
continents.sort() # Sort this list in alphabetical order
my_palette = ["#342040", "#865CA1", "#66B2C4" ,"#98C8D4", "#A9E8A6"] # Create a list of colours I want

# zips together our continent names and pallete colours into one list of tuples
continent_palette = zip(continents, my_palette) 

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Loop through each continent and each colour in turn
for continent_name, colour in continent_palette: # New now has for 3 things not 2
    # Get only the data for the continent we are looking at
    continent_rows = continent_year_life_exp[continent_year_life_exp["continent"] == continent_name]
    
    axes.plot(continent_rows["year"], 
              continent_rows["life_exp"],
              c=colour, # Use corresponding colour from continent_palette
              label=continent_name) # Gives each continent a label for our legend

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by year", 
             fontname="Arial", size=16, x = 0.43, y = 1.05)
plt.title("Mean life expectancy per continent" , 
          fontname="Arial", size=12, loc="left", y = 1.1, x = -0.025)

figure.text(x=0.65, y=-0.01, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)
axes.set_xlabel("Year", fontname="Arial", size=12)
axes.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)

# Set the Y axis lower limit to 0
axes.set_ylim(bottom=0, top = 85)

# Gridlines
axes.grid(b = True , which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
#axes.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

# Customise and show the Legend
plt.legend(bbox_to_anchor=(1.05, 1), 
           loc=2, 
           borderaxespad=0,
           frameon=False, # Turn off border around legend
           prop={"family": "Arial", "size":12})

plt.show();

We can also use our multiplot method here:

# NEW - # Set up the arguments for the loop
continents = continent_year_life_exp["continent"].unique().tolist() # Create a list with each unique value in continent column
continents.sort() # Sort this list in alphabetical order
my_palette = ["#335C67" ,"#F5D000", "#E09F3E", "#CC6163", "#540B0E"] # Create a list of colours I want

# Create our figure and our axes
figure, axes = plt.subplots(nrows= 3,
                            ncols=2, figsize=(15, 15))
figure.subplots_adjust(hspace=0.5)

# zips together our axes, continent names and pallete colours into one list of tuples
continent_palette = zip(axes.flatten(), continents, my_palette) 

# Loop through each continent and each colour in turn
for each_ax, continent_name, colour in continent_palette:
        
    # Get only the data for the continent we are looking at
    continent_rows = continent_year_life_exp[continent_year_life_exp["continent"] == continent_name]

    each_ax.plot(continent_rows["year"], 
                 continent_rows["life_exp"],
                 c=colour)# Use corresponding colour from continent_palette
    
    each_ax.set_title(label = continent_name, loc = "left")
    each_ax.set_xlabel("Year", fontname="Arial", size=12)
    each_ax.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
    each_ax.grid(b = True, which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
    each_ax.grid(b = True, which = "both", axis = "y", color = (0.745, 0.745, 0.745))
    each_ax.set_frame_on(False)
    each_ax.set_axisbelow(True)
    # Set the x and y lims to be the same for each vis - or we loose scale.
    each_ax.set_ylim(bottom=0, top = ( continent_year_life_exp["life_exp"].max() + (continent_year_life_exp["life_exp"].max() / 10) ))



# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by year", 
             fontname="Arial", size=16, y = 0.93, x = 0.25)
figure.text(x=0.12, y= 0.9, s="Mean life expectancy per continent", ha="left",
            fontname="Arial", size=14)
figure.text(x=0.8, y= 0.1, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)

# Remove the empty plot at the bottom right
figure.delaxes(axes[2][1]) # Rows and Cols - starts at 0

plt.show();

This code allows us to create a chart for multiple distributions. This allows us to plot all of the lines, highlighting one line (in this case continent) per plot.

Here we’ve looped using the enumerate method, which is useful as it also gives an index to each iteration of the loop.

With enumerate we need to loop over the same amount of items as subplot axes we’re creating (here 6). That’s why we’ve needed to add an empty value to the continents list, and to our my_pallete and greys lists.

# Set Up
continents = continent_year_life_exp["continent"].unique().tolist()
# Create a list with each unique value in continent column
continents.sort() # Sort this list in alphabetical order
continents.append(" ") 
# We have to plot 6 subplots, even though we have 5 data sets - this adds a blank ``` at the end.

# This will be the colour of the highlighted continetn
bold_highlight = "#203C89"
# Create our figure and our axes
figure, axes = plt.subplots(nrows= 3, ncols=2, figsize=(15, 15))  # Create a grid that's 3 x 2 (6 subplots)
figure.subplots_adjust(hspace=0.5)

# Loop through each subplot axes in turn
for index, each_ax in enumerate(axes.flatten()):
    
    greys = ["#999999", "#999999", "#999999", "#999999", "#999999", "#999999"] 
    # This needs to match the number of axes; even though there's no data on the last one
    greys[index] = bold_highlight
    # Matches our grey colours with our my_pallete colours
    
    continent_palette = zip(continents, greys) 
        # Loop through each continent and each colour in turn
    for continent_name, colour in continent_palette: # New now has for 3 things not 2
        # Get only the data for the continent we are looking at
        continent_rows = continent_year_life_exp[continent_year_life_exp["continent"] == continent_name]

        each_ax.plot(continent_rows["year"], 
                 continent_rows["life_exp"],
                 c=colour) # Use corresponding colour from continent_palette
        
    # Set our titles, labels, grid, ylimits etc
    each_ax.set_title( label = continents[index], loc = "left")
    each_ax.set_xlabel("Year", fontname="Arial", size=12)
    each_ax.set_ylabel("Life expectancy at birth in years", fontname="Arial", size=12)
    each_ax.grid(b = True, which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
    each_ax.grid(b = True, which = "both", axis = "y", color = (0.745, 0.745, 0.745))
    each_ax.set_frame_on(False)
    each_ax.set_axisbelow(True)
    # Set the x and y lims to be the same for each vis - or we loose scale.
    each_ax.set_ylim(bottom=0, top = continent_year_life_exp["life_exp"].max() 
                                      + (continent_year_life_exp["life_exp"].max() / 10))


# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by year", 
             fontname="Arial", size=16, y = 0.93, x = 0.25)
figure.text(x=0.12, y= 0.9, s="Mean life expectancy per continent", ha="left",
            fontname="Arial", size=14)
figure.text(x=0.8, y= 0.1, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)

# Remove the empty plot at the bottom right
figure.delaxes(axes[2][1]) # Rows and Cols - starts at 0

plt.show();

We can also plot multiple lines on one plot in Seaborn; using similar parameters as the multi plot we did for scatter.

Here we have a line for each continent on the same visualisation.


plot = sns.relplot(x = "year", y = "life_exp", data= continent_year_life_exp,
                   kind = "line",
                   hue = "continent")

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by year", 
             fontname="Arial", size=16, y = 1.06, x = 0.35)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, loc="left", y = 1.01, x = -0.13)
plt.text(y = -12, x = 1989, s = "Source: Gapminder", ha="left",
            fontname="Arial", size=12)

#Set the X and Y axis Labels
# Note the methods are set_xlabels -with an s!
plot.set_xlabels("Year", fontname="Arial", size=12)
plot.set_ylabels("Life expectancy at birth in years", fontname="Arial", size=12)

# Set gridlines
axes = plot.ax
axes.grid(b = True, which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
axes.grid(b = True, which = "both", axis = "y", color = (0.745, 0.745, 0.745))


# Set the Y axis lower limit to 0 and upper to 90 to give space.
plt.ylim(0, 90); # Set our Y axis to start from 0 or it "floats")

Seaborn Relplot

We can also multiplots in Seaborn; using similar parameters as the multi plot we did for scatter.

plot = (sns.relplot(x = "year", y = "life_exp", data= continent_year_life_exp,
                    kind = "line",
                    col = "continent",
                    col_wrap = 2)
        .set_titles("")  # Without this we also end up with a central title!
        .set_titles("{col_name}", loc = "left") ) # Without this the titles are "continent = Asia" etc

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions
plt.suptitle("Graph showing Life Expectancy by year", 
             fontname="Arial", size=16, y = 1.02, x = 0.21)
plt.text(x = 1942, y = 291, s = "For each continent", fontname="Arial", size=14 ) 
# When I use "title" here it removes the "Oceania" label
plt.text(x= 2050, y=-18, s="Source: Gapminder", ha="left",
            fontname="Arial", size=12)

#Set the X and Y axis Labels
# Note the methods are set_xlabels -with an s!
plot.set_xlabels("Year", fontname="Arial", size=12)
plot.set_ylabels("Life expectancy at birth in years", fontname="Arial", size=12)

# Set gridlines
for index, each_ax in enumerate(plot.axes.flatten()): 
    each_ax.grid(b = True, which = "both", axis = "x", color = "white") # Without this both grid lines are visible (bug?)
    each_ax.grid(b = True, which = "both", axis = "y", color = (0.745, 0.745, 0.745))


# Set the Y axis lower limit to 0 and upper to 90 to give space.
plt.ylim(0, 90); # Set our Y axis to start from 0 or it "floats")

Seaborn Relplot

6 Discrete X, Continuous Y

6.1 Bar Charts Revisited

A lot of visualisations in this chapter are like the bar chart section from earlier in the chapter.

The main difference is here we are specifying both an X and a Y variable, and previously we only specified a Y variable.

As a result we won’t be covering things like:

  • categorical variables
  • horizontal bar charts
  • colouring bars
  • bar spacing
  • Setting our X and Y limits
  • Setting value labels

In this section again, the methods from earlier will work here too. Please feel free to revisit that section if you need help.

We’ll create some mean life expectancy data, by grouping by continent and finding the mean of the life_exp column.

cont_mean_life_exp = gapminder.groupby(by = "continent", as_index= False)["life_exp"].mean()

cont_mean_life_exp
  continent   life_exp
0    Africa  48.865330
1  Americas  64.658737
2      Asia  60.064903
3    Europe  71.903686
4   Oceania  74.326208

Note here rather than setting height to be the count column, if continuous variable like the life_exp column is used, the bar chart will have a continuous y axis, while retaining the discrete x.

# Create our figure and our axes
figure, axes = plt.subplots(figsize=(6, 5))

# Plot the Data
axes.bar(cont_mean_life_exp.index,  # The X axis is the index values
         height = cont_mean_life_exp["life_exp"], # NEW
         tick_label = cont_mean_life_exp["continent"]) # Set the tick labels to be the continent names, not just numbers.

# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = "white", which = "major", )
axes.grid(b = True, axis = "y", c = (0.745, 0.745, 0.745), which = "major")
axes.set_frame_on(False)

# Add Labels, Title and Captions
plt.suptitle("Mean Life Expectancy per Continent", x=0.3)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.12)
figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_xlabel("Countries")
axes.set_ylabel("Mean Life Expectancy")
axes.grid(b = True, which = "major", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)



# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

# Create a list of the total values to loop over
country_mean = round(cont_mean_life_exp["life_exp"], 2).tolist() # Rounded here otherwise they're too long
# Loop over the values
for i in range(len(country_mean)):
    axes.annotate(country_mean[i], # Text is the value of the loop we're currently on
                  xy = (i, 1),  # At position 1 on the x axis, and the y position of the current loop
                  ha = "center",  # Ofset the text slightly down to be more cental
                  color = "white") # Set the colour of the text to white

plt.show();

Seaborn

Seaborn again, makes it much easier to create this bar chart.

To match the other visualisation I need to sort my gapminder data by continent.

I can simply set my x axis to be the continent column, the y axis to be the life expectancy column (it automatically assumes I want the mean) and my data as the sorted dataframe gapminder_sort_continent.

By default the sns.barplot() has confidence interval bars on; we turn those off by setting ci = False

gapminder_sort_continent = gapminder.groupby("continent")["life_exp"].mean().reset_index()
gapminder_sort_continent.sort_values("continent").reset_index() # Sort 
bar_plot = sns.barplot(x="continent", y = "life_exp", data=gapminder_sort_continent,
                       ci = None)

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Mean Life Expectancy per Continent", x = 0.30, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.047)
bar_plot.text(x=3, y= -15, s="Source: Gapminder", ha="left")
bar_plot.set_xlabel("Continent")
bar_plot.set_ylabel("Mean Life Expectancy")


# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

6.1.1 Multiple Bars

While this is possible in matplotlib; it is much simpler to do in Seaborn;.

First we’ll create a new DataFrame that shows the total population per continent for each year from 1992 to 2007 (the last year in our data).

pop_continent_after_92 = (gapminder[gapminder["year"] >= 1992]  # Filter for data 1992 onwards
                          .groupby(["year", "continent"], as_index = False) # Group by year and continent, reseting index
                          .agg({"pop" : np.sum}) # Sum of the population
                           )

pop_continent_after_92["pop_millions"] = pop_continent_after_92["pop"] / 1000000

pop_continent_after_92.head()
   year continent           pop  pop_millions
0  1992    Africa  6.590815e+08    659.081517
1  1992  Americas  7.392741e+08    739.274104
2  1992      Asia  3.133292e+09   3133.292191
3  1992    Europe  5.581428e+08    558.142797
4  1992   Oceania  2.091965e+07     20.919651
bar_plot = sns.barplot(x="year", y = "pop_millions" , data = pop_continent_after_92,
                       hue = "continent", # colour the bar according to the continent
                       ci = None)

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Mean population per Continent", x = 0.22, y = 1.05, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.047, y = 1.05)
plt.text(x=2.5, y= -900, s="Source: Gapminder", ha="left")
bar_plot.set_xlabel("Year")
bar_plot.set_ylabel("Mean population \n (millions)")

# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

# Set Tick Colours to the same grey as our gridlines
bar_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
bar_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.) # Moves the legend to a box outside of the plot.

plt.show();

Stacked bar charts are again possible in Matplotlib and Seaborn; but our simplest option is to use the .plot() from Pandas as this has the argument stacked = True.

These are commonly used for percentages, or where the bars will add up to the same number.

In this section we’ll be comparing the populations of each continent. These are represented as a percentage of the total for each year.

The plot below uses data from the years 1952, 1972, 1987 and 2002. The differences are not vast; but you can see that Europe’s population as a percentage of the whole has shrunk, and Africa’s has increased.

# Prepare the Data

percentage = (gapminder[gapminder["year"].isin([1952, 1972, 1987, 2002])] # Select the Years
                           .groupby(["year","continent"])["pop"].sum().rename("count") )

percentage = (percentage * 100)/ percentage.groupby(level=[0]).transform("sum") # Makes the column a percentage

percentage = percentage.reset_index()
percentage.pivot("year", "continent", "count").plot(kind="bar" ,stacked = True)


plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.); #Moves the legend to a box outside of the plot.;

# Obtain the axes object so we can manipulate it
axes = plt.gca()

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Population as a % of the years total", x = 0.26, y = 1.05, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.047, y = 1.05)
plt.text(x=2.5, y= -30, s="Source: Gapminder", ha="left")
axes.set_xlabel("Year")
axes.set_ylabel("Percentage")

# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = "white", which = "major")
axes.grid(b = True, axis = "y", c = (0.745, 0.745, 0.745), which = "major")
axes.set_frame_on(False)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.) # Moves the legend to a box outside of the plot.

plt.show();

These are more commonly seen as a horizontal version. Again; they’re not considered very good examples of visualisation and we would advise using them sparingly.

percentage.pivot("year", "continent", "count").plot(kind="barh" ,stacked = True)

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.); #Moves the legend to a box outside of the plot.;

# Obtain the axes object so we can manipulate it
axes = plt.gca()

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Population as a % of the years total", x = 0.22, y = 1.05, fontname="Arial", size=16,)
plt.title("Data from Gapminder Dataset" , 
          fontname="Arial", size=12, x = 0.047, y = 1.05)
plt.text(x=80, y= -1, s="Source: Gapminder", ha="left")
axes.set_ylabel("Year")
axes.set_xlabel("Percentage")

# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = (0.745, 0.745, 0.745), which = "major")
axes.grid(b = True, axis = "y", c = "white", which = "major")
axes.set_frame_on(False)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745));

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.) # Moves the legend to a box outside of the plot.

plt.show();

6.2 Box Plots

Although not commonly shown in publications we felt it was important to include box plots (and violin plots) as they’re commonly used in exploratory data analysis. As such these are often “less attractive” than previous plots we’ve seen.

# Set up the Data
life_exp = gm_1987[gm_1987["life_exp"].notnull()]["life_exp"]

# Set up the axes and figure
figure, axes = plt.subplots(figsize=(6, 5))

# Plot
axes.boxplot(x = life_exp.values)  

# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = "white", which = "major")
axes.grid(b = True, axis = "y", c = (0.745, 0.745, 0.745), which = "major")
axes.set_frame_on(False)

# Add Labels, Title and Captions
plt.suptitle("Life Expectancy", x = 0.12)
plt.title("1987 Data from Gapminder Dataset", 
          fontname="Arial", size = 12, x = 0.12)

figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Life Expectancy")
axes.set_xlabel("1987")
axes.grid(b = True, which = "major", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)

# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

A Pandas version also exists

figure, axes = plt.subplots(figsize=(6, 5))

gm_1987.boxplot(column = "life_exp")

# Obtain the axes object so we can manipulate it
axes = plt.gca()

# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = "white", which = "major")
axes.grid(b = True, axis = "y", c = (0.745, 0.745, 0.745), which = "major")
axes.set_frame_on(False)

# Add Labels, Title and Captions
plt.suptitle("Life Expectancy", x = 0.12)
plt.title("1987 Data from Gapminder Dataset" , 
          fontname = "Arial", size=12, x = 0.12)

figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Life Expectancy")
axes.set_xlabel("1987")
axes.grid(b = True, which = "major", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)



# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

And using Seaborn:

# Set up the figure and axes
figure, axes = plt.subplots(figsize=(6,6))

# Plot
box_plot = sns.boxplot(y = "life_exp", data = gm_1987)

# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Life Expectancy", x = 0.22, y = 1., fontname="Arial", size=16,)
plt.title("1987 Data from Gapminder Dataset" , 
          fontname = "Arial", size = 12, x = 0.23, y = 1.05)

plt.text(x=0.2, y= 30, s="Source: Gapminder", ha="left")
box_plot.set_ylabel("Life Expectancy")
box_plot.set_xlabel("1987")

# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

# Set Tick Colours to the same grey as our gridlines
box_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
box_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

6.2.1 Colours

Customisation is quite limited; again this is often an exploratory visualisation rather than explanatory.

This code also works for the Seaborn and the Pandas plots.

# Create the figure an axes
figure, axes = plt.subplots(figsize=(6, 5))

#Plot the visualisation
axes.boxplot(x = life_exp.values,
             boxprops = dict(linestyle="-", linewidth=2, color="b"), # NEW, customise the box
             medianprops = dict(linestyle="-", linewidth=2, color="r"))  # NEW customise the median


# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = "white", which = "major")
axes.grid(b = True, axis = "y", c = (0.745, 0.745, 0.745), which = "major")
axes.set_frame_on(False)

# Add Labels, Title and Captions
plt.suptitle("Life Expectancy", x = 0.12)
plt.title("1987 Data from Gapminder Dataset" , 
          fontname = "Arial", size = 12, x = 0.12)

figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Life Expectancy")
axes.set_xlabel("1987")
axes.grid(b = True, which = "major", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)



# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

6.2.2 Multiple Boxes

For multiple box plots we can do a groupby() and use .get_group() to access and plot each group.

This is a less complicated way of plotting a multiple box plot than looping over the boxes; although that is still a valid approach.

# Set up the figure and axis
figure, axes = plt.subplots(figsize=(6, 5))

# remove na values
life_exp = gm_1987[gm_1987["life_exp"].notnull()]

#Plot the visualisation
groups = life_exp.groupby("continent")["life_exp"]

axes.boxplot([groups.get_group("Africa"),
              groups.get_group("Americas"),
              groups.get_group("Asia"),
              groups.get_group("Europe"),
              groups.get_group("Oceania")],
              labels = ["Africa", "America", "Asia", "Europe", "Oceania"],
              medianprops = dict(linestyle="-", linewidth=2, color="r"))


# Set Gridlines and colours
axes.grid(b = True, axis = "x", c = "white", which = "major")
axes.grid(b = True, axis = "y", c = (0.745, 0.745, 0.745), which = "major")
axes.set_frame_on(False)

# Add Labels, Title and Captions
plt.suptitle("Life Expectancy", x = 0.12)
plt.title("1987 Data from Gapminder Dataset" , 
          fontname = "Arial", size = 12, x = 0.12)

figure.text(x=0.65, y=-0.02, s="Source: Gapminder", ha="left")
axes.set_ylabel("Life Expectancy")
axes.set_xlabel("countries")
axes.grid(b = True, which = "major", axis = "y", color = (0.745, 0.745, 0.745))
axes.set_frame_on(False)



# Set Tick Colours to the same grey as our gridlines
axes.xaxis.set_tick_params(color=(0.745,0.745,0.745))
axes.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.show();

We can also do this plot easily in Seaborn by setting hue = “continent”

# Set up the figure and axes
figure, axes = plt.subplots(figsize=(6,6 ))

# Plot
box_plot = sns.boxplot(x = "continent",
                       y = "life_exp", 
                       hue = "continent",
                       data = gm_1987)
        
# Removes the "spines" or edges of our vis
sns.despine(left=True, bottom=True)

# Add Labels, Title and Captions (this comes from Matplotlib!)
plt.suptitle("Life Expectancy", x = 0.22, y = 1., fontname="Arial", size=16,)
plt.title("1987 Data from Gapminder Dataset" , 
          fontname = "Arial", size = 12, x = 0.23, y = 1.05)

plt.text(x=3, y= 30, s="Source: Gapminder", ha="left")
box_plot.set_ylabel("Life Expectancy (years)")
box_plot.set_xlabel("Continent")

# Set Gridlines and colours
bar_plot.grid(b = True , which = "both", axis = "y", color = (0.745, 0.745, 0.745))

# Set Tick Colours to the same grey as our gridlines
box_plot.xaxis.set_tick_params(color=(0.745,0.745,0.745))
box_plot.yaxis.set_tick_params(color=(0.745,0.745,0.745))

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.) # Moves the legend to a box outside of the plot.

plt.show();

7 End of Chapter

You have completed chapter 4 of the Data Visualisation course. Please move on to chapter 5.